Search results for "Statistically validated network"
showing 10 items of 18 documents
Backbone of credit relationships in the Japanese credit market
2016
We detect the backbone of the weighted bipartite network of the Japanese credit market relationships. The backbone is detected by adapting a general method used in the investigation of weighted networks. With this approach we detect a backbone that is statistically validated against a null hypothesis of uniform diversification of loans for banks and firms. Our investigation is done year by year and it covers more than thirty years during the period from 1980 to 2011. We relate some of our findings with economic events that have characterized the Japanese credit market during the last years. The study of the time evolution of the backbone allows us to detect changes occurred in network size,…
Statistically Validated Networks for assessing topic quality in LDA models
2022
Probabilistic topic models have become one of the most widespread machine learning technique for textual analysis purpose. In this framework, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) gained more and more popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution overwords characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned from the model may be only characterized by a set of irrelevant or unchained words, being useless for the interpretation. Although many topic-quality metrics were proposed (Newman et al., 2009; Alet…
Exploring topics in LDA models through Statistically Validated Networks: directed and undirected approaches
2022
Probabilistic topic models are machine learning tools for processing and understanding large text document collections. Among the different models in the literature, Latent Dirichlet Allocation (LDA) has turned out to be the benchmark of the topic modelling community. The key idea is to represent text documents as random mixtures over latent semantic structures called topics. Each topic follows a multinomial distribution over the vocabulary words. In order to understand the result of a topic model, researchers usually select the top-n (essential words) words with the highest probability given a topic and look for meaningful and interpretable semantic themes. This work proposes a new method …
MEASURING TOPIC COHERENCE THROUGH STATISTICALLY VALIDATED NETWORKS
2020
Topic models arise from the need of understanding and exploring large text document collections and predicting their underlying structure. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has quickly become one of the most popular text modelling techniques. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models give no guaranty on the interpretability of their outputs. The topics learned from texts may be characterized by a set of irrelevant or unchained words. Therefore, topic models require validation of the coherence of estimated topics. However, the automatic evaluation …
Ranking coherence in topic models using statistically validated networks
2023
Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovering is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it is an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-oc…
Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data
2014
Big data open up unprecedented opportunities to investigate complex systems including the society. In particular, communication data serve as major sources for computational social sciences but they have to be cleaned and filtered as they may contain spurious information due to recording errors as well as interactions, like commercial and marketing activities, not directly related to the social network. The network constructed from communication data can only be considered as a proxy for the network of social relationships. Here we apply a systematic method, based on multiple hypothesis testing, to statistically validate the links and then construct the corresponding Bonferroni network, gen…
Patterns of trading profiles at the Nordic Stock Exchange. A correlation-based approach.
2016
We investigate the trading behavior of Finnish individual investors trading the stocks selected to compute the OMXH25 index in 2003 by tracking the individual daily investment decisions. We verify that the set of investors is a highly heterogeneous system under many aspects. We introduce a correlation based method that is able to detect a hierarchical structure of the trading profiles of heterogeneous individual investors. We verify that the detected hierarchical structure is highly overlapping with the cluster structure obtained with the approach of statistically validated networks when an appropriate threshold of the hierarchical trees is used. We also show that the combination of the cor…
STRANIERI, MERIDIONALI O PROVINCIALI? I CONSUMI NEL TEMPO LIBERO DELLE SECONDE GENERAZIONI
2022
In this paper, we analyze consumption patterns of leisure time among young people belonging to the so-called “second generation” of immigrants in Italy. Leisure time consumption describes how young immigrants use cultural products and services. We analyze data collected by the ISTAT through the survey on the “second generations” (2015). A comparison of leisure consumption patterns between second-generation immigrants and their Italian peers does not show significant differences. Rather, differences in consumption styles are associated to gender (male/female), geographic area of residence (North/South), and size of the municipality (large municipality/small municipality) of residence.
Emergence of statistically validated financial intraday lead-lag relationships
2014
According to the leading models in modern finance, the presence of intraday lead-lag relationships between financial assets is negligible in efficient markets. With the advance of technology, however, markets have become more sophisticated. To determine whether this has resulted in an improved market efficiency, we investigate whether statistically significant lagged correlation relationships exist in financial markets. We introduce a numerical method to statistically validate links in correlation-based networks, and employ our method to study lagged correlation networks of equity returns in financial markets. Crucially, our statistical validation of lead-lag relationships accounts for mult…